1 Introduction and objectives

In this hands-on session, we will show how principal component analysis (PCA) is used in common bioinformatics workflows. We will use a toy single-cell RNA-seq (scRNA-seq) dataset and a real NCI microarray data to illustrate it.

The main objectives are the following:

  • Understand the intuition behind PCA.
  • Learn to compute PCA in 2 ways in R: by eigendecomposition of the covariance matrix and using the prcomp function.
  • Perform a case-study using scRNA-seq data.

2 Introduction to PCA

Analyzing tabular datasets, specially high-dimensional data, involves identifying which features carry more information and whether some features can be recovered from other features. Principal component analysis (PCA) is a dimensionality reduction method that transforms high-dimensions data into lower-dimensions while retaining as much information as possible.

We will work tabular data with numerical entries, using an \(n x m\) matrix to represent feature values (columns) for experimental samples (rows).

PCA offers several advantages when analyzing biological data:

  1. Reduce the noise of the dataset: by finding the orthogonal axis that maximize the variance, PCA eliminates noise that may have been introduced during the experiments.
  2. Reduce the redundancy: as we will see below, when analyzing gene expression data, there is a vast degree of redundancy across gene sets. Thus, principal components can be interpreted as a ā€œmetageneā€: a score that summarizes information from a correlated set of genes.
  3. Reduce computational complexity: since we reduce the dimensionality, operations downstream such as clustering will be executed faster.

3 Case-study scRNA-seq data

3.1 Introduction to scRNA-seq

Single-cell RNA-seq allows the gene expression profiling of thousands of cells, one a time. It is often exemplified using the following analogy:

  • Bulk RNA-seq would be analogous to a fruit smoothie, in which all we get is an average taste out of several fruits. Analogously, in bulk RNA-seq all we get is an average transcriptome across a population of cells.
  • scRNA-seq, on the other hand, is analogous to a fruit salad, in which you can taste one fruit at a time. Here, we get a transcriptome for each single cell, so we can understand cell-to-cell variability in gene expression.

3.2 Why use PCA to analyze scRNA-seq data

scRNA-seq is a double-edged sword. On the one hand, scRNA-seq provides unprecedented discriminative power to chart all the cell types and states in a given tissue. This has catalyzed the creation of the Human Cell Atlas (HCA), which aims to classify all cells in the human body using scRNA-seq; and has been coined as ā€œthe periodic table of human cellsā€.

On the other hand, however, scRNA-seq has a high degree of technical variability, which can be introduced at multiple points of the process. Some examples include the following (Definitions extracted from table 1 of this paper):

  • Dissociation artifacts: enzymatic treatment is frequently used to dissociate tissues into single-cells. This can induce cellular stress that can be confounded by the variable of interest. Described in detail in this paper.
  • Doublets: Two cells that are processed together in a reaction volume and receive the same single-cell barcode.
  • Dropouts: Transcripts that are not detected in the final dataset even though the gene is expressed in the cell, leading to false zero values in the expression matrix.

3.3 PCA with scRNA-seq data

As an example, we will create an expression matrix with the following cells:

  • 4 T-cells (identified by high expression of CD3D and CD3E).
  • 3 monocytes (identified by high expression of LYZ and S100A8).
  • 3 naive B-cells (identified by high expression of CD79A, CD79B and BLNK).
  • 2 plasma cells (identified by B-cell and proliferation markers, such as TOP2A or MKI67).
  • 2 poor-quality cells (identified by high mitochondrial expression). If a cell has pores in the membrane due to cell lysis, the cytosolic mRNA will leak out of the cell; but if the diameter of mitochondria is higher than the pores, they will get trapped inside the cell.

In the following image you can visualize the redundancy between CD79A and CD79B. Together, they form the heterodimer CD79; which means that for every molecule of CD79A we need one of CD79B. Under the hood they are the same variable, which should be captured by a single principal component.

Image obtained from this link

3.4 Create and visualize toy dataset

Load packages

library(pheatmap)
library(tidyverse)

We will create a toy dataset that contains the gene expression of each of the cells. Cells are represented as rows and genes as columns.

# Create toy dataset
toy <- data.frame(
  CD3D = c(4, 5, 4, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
  CD3E = c(5, 3, 4, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
  LYZ = c(0, 0, 0, 0, 5, 7, 6, 0, 0, 0, 0, 0, 0, 0),
  S100A8 = c(0, 0, 0, 0, 5, 7, 6, 0, 0, 0, 0, 0, 0, 0),
  CD79A = c(0, 0, 0, 0, 0, 0, 0, 6, 4, 4, 3, 5, 0, 0),
  CD79B = c(0, 0, 0, 0, 0, 0, 0, 5, 5, 3, 4, 6, 0, 0),
  BLNK = c(0, 0, 0, 0, 0, 0, 0, 6, 4, 5, 5, 4, 0, 0),
  TOP2A = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 12, 9, 0, 0),
  MKI67 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 11, 10, 0, 0),
  MT_CO1 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 18, 23),
  MT_ND5 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 22, 25) 
)
rownames(toy) <- paste("cell", as.character(1:nrow(toy)))
toy <- as.matrix(toy)


# Visualize
pheatmap(toy, cluster_cols = FALSE, cluster_rows = FALSE, angle_col = 45)

3.5 PCA in R: 2 ways

Two ways to perform PCA in R:

  • Eigendecomposition of the covariance matrix.
  • prcomp function.

Regardless of the method, we always need to normalize the data (center and scale). We will do it by using the function scale:

toy_normalized <- scale(toy, center = TRUE, scale = TRUE)
toy_normalized[1:5,1:5]
##              CD3D       CD3E        LYZ     S100A8      CD79A
## cell 1  1.4913495  2.0133218 -0.4974087 -0.4974087 -0.6899925
## cell 2  2.0133218  0.9693771 -0.4974087 -0.4974087 -0.6899925
## cell 3  1.4913495  1.4913495 -0.4974087 -0.4974087 -0.6899925
## cell 4  0.9693771  1.4913495 -0.4974087 -0.4974087 -0.6899925
## cell 5 -0.5965398 -0.5965398  1.4369585  1.4369585 -0.6899925

Now we will compute the exact same thing (PCA) in the 2 ways.

In each case we will be tracking the same three pieces of information:

  • the principal components (pdir), also known as loadings.
  • the variance of the data along each principal component (var).
  • the scores of each data sample in the principal components (score).

3.5.1 Eigendecompostion of the covariance matrix

We will be computing an eigenbasis for the covariance matrix \(\Omega = A^tA\) where \(A\) is the normalized data (centered and scaled). We will use the function eigen.

The eigenvectors of the covariance matrix are the loadings (principal components):

cov_matrix <-  (t(toy_normalized) %*% toy_normalized)
eigen_cov_matrix <- eigen(cov_matrix)

pdir_eigen <- eigen_cov_matrix$vectors
pdir_eigen[1:5,1:5]
##            [,1]        [,2]        [,3]        [,4]      [,5]
## [1,] -0.2229758 -0.50005276 -0.17225849 -0.08980263 0.1673922
## [2,] -0.2229758 -0.50005276 -0.17225849 -0.08980263 0.1673922
## [3,] -0.1704618  0.48788427 -0.31105536 -0.08464816 0.1492861
## [4,] -0.1704618  0.48788427 -0.31105536 -0.08464816 0.1492861
## [5,]  0.4308542 -0.01396106 -0.01460081  0.32621575 0.3088705

The eigenvalues contains the variance of the data along each principal component.

var_eigen <- eigen_cov_matrix$values
head(var_eigen)
## [1] 59.6132013 33.9501945 31.0238660 15.9177700  1.0317758  0.8173653

The scores of each data sample in the principal components can be obtained by multiplying the original matrix with the matrix with the eigenvectors as columns:

scores_eigen <- toy_normalized %*% pdir_eigen
scores_eigen[1:5, 1:5]
##             [,1]      [,2]       [,3]       [,4]        [,5]
## cell 1 -1.700709 -2.277482 -0.7241901 -0.2151054  0.10259334
## cell 2 -1.584322 -2.016468 -0.6342760 -0.1682310  0.01521924
## cell 3 -1.584322 -2.016468 -0.6342760 -0.1682310  0.01521924
## cell 4 -1.467934 -1.755454 -0.5443618 -0.1213565 -0.07215486
## cell 5 -1.312695  1.959136 -1.1183532 -0.1207163 -0.10622519

3.5.2 PCA

Now we will use a stand-alone PCA function provided by R, prcomp, to calculate the principal components. Using scale = TRUE we standardize the variables by converting the mean to zero and the standard deviation to one, directly within the function prcomp .

pca_out <- prcomp(toy, center = TRUE, scale = TRUE)

We can use the following keywords to parse the output instance:

Principal components (loadings): ā€œrotationā€

pdir_pca <- pca_out$rotation
pdir_pca[1:5,1:5]
##               PC1         PC2        PC3         PC4       PC5
## CD3D   -0.2229758 -0.50005276 0.17225849  0.08980263 0.1673922
## CD3E   -0.2229758 -0.50005276 0.17225849  0.08980263 0.1673922
## LYZ    -0.1704618  0.48788427 0.31105536  0.08464816 0.1492861
## S100A8 -0.1704618  0.48788427 0.31105536  0.08464816 0.1492861
## CD79A   0.4308542 -0.01396106 0.01460081 -0.32621575 0.3088705

Standard deviations : ā€œsdevā€

var_pca <- pca_out$sdev ** 2
head(var_pca)
## [1] 4.58563087 2.61155343 2.38645123 1.22444385 0.07936737 0.06287425

Principal components scores: ā€œxā€

scores_pca <- pca_out$x
scores_pca[1:5,1:5]
##              PC1       PC2       PC3       PC4         PC5
## cell 1 -1.700709 -2.277482 0.7241901 0.2151054  0.10259334
## cell 2 -1.584322 -2.016468 0.6342760 0.1682310  0.01521924
## cell 3 -1.584322 -2.016468 0.6342760 0.1682310  0.01521924
## cell 4 -1.467934 -1.755454 0.5443618 0.1213565 -0.07215486
## cell 5 -1.312695  1.959136 1.1183532 0.1207163 -0.10622519

In summary, we obtain the following:

  • Gene loadings: pca_out$rotation
  • Variance associated to each principal component: pca_out$sdev ** 2
  • PC scores: pca_out$x

3.6 Infer dimensionality of the dataset

When using prcomp function can get the proportion of variance explained (PVE) and the cumulative proportion with the summary function (more info in this book):

summary(pca_out)
## Importance of components:
##                           PC1    PC2    PC3    PC4     PC5     PC6     PC7
## Standard deviation     2.1414 1.6160 1.5448 1.1065 0.28172 0.25075 0.19522
## Proportion of Variance 0.4169 0.2374 0.2170 0.1113 0.00722 0.00572 0.00346
## Cumulative Proportion  0.4169 0.6543 0.8712 0.9826 0.98977 0.99548 0.99895
##                            PC8     PC9    PC10      PC11
## Standard deviation     0.09752 0.04135 0.01862 1.383e-16
## Proportion of Variance 0.00086 0.00016 0.00003 0.000e+00
## Cumulative Proportion  0.99981 0.99997 1.00000 1.000e+00

To reduce the dimensionality, we will choose the number of PC that explains most of the variance in the dataset. To this end, we use a so-called elbow plot:

# Calculate percentage of variance explained (PVE):
pve <- pca_out$sdev ** 2 / sum(pca_out$sdev ** 2) * 100
pve <- round(pve, 2)
pve_df <- data.frame(principal_component = 1:length(pve), pve = pve)
pve_gg <- pve_df %>%
  ggplot(aes(principal_component, pve)) +
    geom_point() +
    geom_vline(xintercept = 4.5, linetype = "dashed", color = "darkblue") +
    scale_x_continuous(breaks = 1:length(pve)) +
    labs(x = "Principal Component", y = "Percentage of Variance Explained (%)") +
    theme_bw()
pve_gg

Some people prefer to plot the cumulative proportion of explained variance:

summ_pca <- summary(pca_out)
cum_pct <- summ_pca$importance["Cumulative Proportion", ] * 100
cum_pct_df <- data.frame(principal_component = 1:length(pve), cum_pct = cum_pct)
cum_pct_gg <- cum_pct_df %>%
  ggplot(aes(principal_component, cum_pct)) +
    geom_point() +
    geom_vline(xintercept = 4.5, linetype = "dashed", color = "darkblue") +
    scale_x_continuous(breaks = 1:length(cum_pct)) +
    scale_y_continuous(limits = c(0, 100)) +
    labs(x = "Principal Component", y = "Cumulative Percentage of Variance Explained (%)") +
    theme_bw()
cum_pct_gg

Thus, we conclude that the first 4 PC are significant. Let us express each cell as a vector of the first 4 PC (scores):

toy_reduced <- pca_out$x[, c("PC1", "PC2", "PC3", "PC4")]
pheatmap(toy_reduced, cluster_rows = FALSE, cluster_cols = FALSE, angle_col = 45)

3.7 Visualize gene loadings

Now we have reduced the dimensionality of the dataset. To interpret each principal component, let us inspect the gene loadings:

gene_loads_reduced <- pca_out$rotation[, c("PC1", "PC2", "PC3", "PC4")]
significant_pcs <- c("PC1", "PC2", "PC3", "PC4")
loadings_gg <- map(significant_pcs, function(x) {
  loadings <- gene_loads_reduced[, x]
  df <- data.frame(gene = names(loadings), score = loadings)
  p <- df %>%
    ggplot(aes(fct_reorder(gene, score), score)) +
      geom_point() +
      labs(x = "", y = x) +
      theme_bw() +
      coord_flip()
  return(p)
})
loadings_gg
## [[1]]

## 
## [[2]]

## 
## [[3]]

## 
## [[4]]

3.8 Cluster cells

Clustering means classifying observations into groups that minimize and maximize intra-group and inter-group distances, respectively.

3.8.1 Calculate all pairwise Euclidean distances

To that end we need a concept of distance, which basically measures how similar or different two vectors are. In our case we will use Euclidean distance, but be aware that there are a plethora of different options that can yield different clustering results.

Let us start by computing all pairwise distances (distance between cells, considering their expression profiles):

dist_mat <- dist(toy_reduced, method = "euclidean")
pheatmap(dist_mat, cluster_rows = FALSE, cluster_cols = FALSE)

3.9 Perform hierarchical clustering

hclust_average <- hclust(dist_mat, method = "average")
plot(
  hclust_average,
  labels = rownames(toy),
  main = "Average Linkage",
  xlab = "",
  sub = "",
  ylab = "Cophenetic distance"
)

3.9.1 Cut the dendrogram and visualize clusters

plot(
  hclust_average,
  labels = rownames(toy),
  main = "Average Linkage",
  xlab = "",
  sub = "",
  ylab = "Cophenetic distance"
)
abline(h = 2.5, col = "red")

Visualize the correspondent cluster to each of the cells:

hc_clusters <- cutree(hclust_average, h = 2.5)
hc_clusters
##  cell 1  cell 2  cell 3  cell 4  cell 5  cell 6  cell 7  cell 8  cell 9 cell 10 
##       1       1       1       1       2       2       2       3       3       3 
## cell 11 cell 12 cell 13 cell 14 
##       4       4       5       5

Visualize the number of cells in each cluster:

table(hc_clusters)
## hc_clusters
## 1 2 3 4 5 
## 4 3 3 2 2

3.9.2 Annotation

Cluster Cell type
1 T-cells
2 Monocytes
3 Naive B-cells
4 Plasma Cells
5 poor-quality cells

Visualize the centered gene expression across cells and their corresponding cluster:

annotation_rows <- data.frame(cluster = as.character(hc_clusters))
rownames(annotation_rows) <- names(hc_clusters)
annotated_clusters <- c("T-cells", "Monocytes", "Naive B-cells", "Plasma Cells", "poor-quality cells")
names(annotated_clusters) <- as.character(1:5)
annotation_rows$annotation <- annotated_clusters[annotation_rows$cluster]
pheatmap(
  toy_normalized,
  cluster_rows = FALSE,
  cluster_cols = FALSE,
  angle_col = 315,
  annotation_row = annotation_rows
)

4 Exercise microarray data

We will use the data from NCI 60 Data from the ISLR package. This data set contains NCI microarray data. The data contains expression levels on 6830 genes from 64 cancer cell lines.

Exercise

Find the principal components of this dataset and cluster the cell lines according to their similarities in gene expression.

We will load the data and keep the information of the table and the tumor types of the cell lines:

library(ISLR)
data_NCI60 <- NCI60$data
labels_NCI60 <- NCI60$labs
rownames(data_NCI60) <- labels_NCI60

We can look at the dimension of the data set which contains information about 64 cell lines of different tumor types and 6830 genes.

dim(data_NCI60)
## [1]   64 6830

We can see the list of the tumor types:

labels_NCI60
##  [1] "CNS"         "CNS"         "CNS"         "RENAL"       "BREAST"     
##  [6] "CNS"         "CNS"         "BREAST"      "NSCLC"       "NSCLC"      
## [11] "RENAL"       "RENAL"       "RENAL"       "RENAL"       "RENAL"      
## [16] "RENAL"       "RENAL"       "BREAST"      "NSCLC"       "RENAL"      
## [21] "UNKNOWN"     "OVARIAN"     "MELANOMA"    "PROSTATE"    "OVARIAN"    
## [26] "OVARIAN"     "OVARIAN"     "OVARIAN"     "OVARIAN"     "PROSTATE"   
## [31] "NSCLC"       "NSCLC"       "NSCLC"       "LEUKEMIA"    "K562B-repro"
## [36] "K562A-repro" "LEUKEMIA"    "LEUKEMIA"    "LEUKEMIA"    "LEUKEMIA"   
## [41] "LEUKEMIA"    "COLON"       "COLON"       "COLON"       "COLON"      
## [46] "COLON"       "COLON"       "COLON"       "MCF7A-repro" "BREAST"     
## [51] "MCF7D-repro" "BREAST"      "NSCLC"       "NSCLC"       "NSCLC"      
## [56] "MELANOMA"    "BREAST"      "BREAST"      "MELANOMA"    "MELANOMA"   
## [61] "MELANOMA"    "MELANOMA"    "MELANOMA"    "MELANOMA"

Solution

We will use the function prcomp to calculate the principal components.

4.1 Perform PCA

pr_output <- prcomp(data_NCI60, center =TRUE,  scale = TRUE)

As in the previous example, we can calculate the loadings:

pdir_pca <- pr_output$rotation
pdir_pca[1:6,1:6]
##            PC1          PC2         PC3          PC4          PC5          PC6
## 1 -0.010682370  0.001324406 0.008503514 -0.003524094 -0.010126893  0.028903572
## 2 -0.002312078  0.001675266 0.010256593  0.002603645 -0.011400802  0.011242885
## 3 -0.005879750 -0.006289434 0.010055415 -0.010681458  0.010264980  0.018449224
## 4  0.003278071  0.002666138 0.008361513 -0.007475761  0.011248268  0.005553433
## 5 -0.007677535 -0.002508097 0.013820836  0.009509144  0.004094756 -0.002731900
## 6  0.002266671 -0.009677933 0.010818283 -0.012751147 -0.007196820  0.010587841

The standard deviations :

var_pca <- pr_output$sdev ** 2
head(var_pca)
## [1] 775.8157 461.4486 392.8508 290.1080 255.0986 247.1524

And the principal components scores

scores_pca <- pr_output$x
scores_pca[1:6,1:6]
##              PC1       PC2         PC3         PC4        PC5        PC6
## CNS    -19.68245  3.527748  -9.7354382   0.8177816 -12.511081  7.4129037
## CNS    -22.90812  6.390938 -13.3725378  -5.5911088  -7.972471  3.6860385
## CNS    -27.24077  2.445809  -3.5053437   1.3311502 -12.466296 17.2088846
## RENAL  -42.48098 -9.691742  -0.8830921  -3.4180227 -41.938370 27.0251739
## BREAST -54.98387 -5.158121 -20.9291076 -15.7253986 -10.361364 12.8891587
## CNS    -26.96488  6.727122 -21.6422924 -13.7323153   7.934827  0.7073495

4.2 Plot the scores

We will plot the first 2 principal components score vectors with a dot plot, in order to visualize the data. We will color the observations (cell lines) according to their cancer type. In this way we can see to what extent the observations within a cancer type are similar to each other.

scores <- as.data.frame.array(pr_output$x)
scores$tumor_type <- labels_NCI60
scores %>% 
  ggplot(aes(x = PC1, y = PC2, color = labels_NCI60)) +
    geom_point() +
    labs(x = "PC1", y = "PC2") +
    theme_classic()

4.3 Visualize proportion of the variance explained

summary(pr_output)$importance[,c(1:14)]
##                             PC1      PC2      PC3      PC4      PC5      PC6
## Standard deviation     27.85347 21.48136 19.82046 17.03256 15.97181 15.72108
## Proportion of Variance  0.11359  0.06756  0.05752  0.04248  0.03735  0.03619
## Cumulative Proportion   0.11359  0.18115  0.23867  0.28115  0.31850  0.35468
##                             PC7      PC8      PC9     PC10     PC11     PC12
## Standard deviation     14.47145 13.54427 13.14400 12.73860 12.68672 12.15769
## Proportion of Variance  0.03066  0.02686  0.02529  0.02376  0.02357  0.02164
## Cumulative Proportion   0.38534  0.41220  0.43750  0.46126  0.48482  0.50646
##                            PC13     PC14
## Standard deviation     11.83019 11.62554
## Proportion of Variance  0.02049  0.01979
## Cumulative Proportion   0.52695  0.54674

As in the previous example we will represent the cumulative proportion of variance explained:

pve <- pr_output$sdev ** 2 / sum(pr_output$sdev ** 2) * 100
pve <- round(pve, 2)
pve_df <- data.frame(principal_component = 1:length(pve), pve = pve)
summ_pca <- summary(pr_output)
cum_pct <- summ_pca$importance["Cumulative Proportion", ] * 100
cum_pct_df <- data.frame(principal_component = 1:length(pve), cum_pct = cum_pct)
cum_pct_gg <- cum_pct_df %>%
  ggplot(aes(principal_component, cum_pct)) +
    geom_point() +
    scale_x_continuous(breaks = 1:length(cum_pct)) +
    scale_y_continuous(limits = c(0, 100)) +
    labs(x = "Principal Component", y = "Cumulative Percentage of Variance Explained (%)") +
    theme_bw()
cum_pct_gg

4.4 Cluster the samples

In this case, we will select the first 8 PC. We can cluster the samples within the heatmap:

pheatmap(scores[, c("PC1", "PC2", "PC3","PC4","PC5", "PC6", "PC7","PC8")], cluster_rows = TRUE, cluster_cols = FALSE, angle_col = 45, labels_row = labels_NCI60, fontsize_row = 5)

5 Session Information

sessionInfo()
## R version 4.3.1 (2023-06-16)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.6 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=es_ES.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=es_ES.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=es_ES.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Europe/Madrid
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] ISLR_1.4         lubridate_1.9.3  forcats_1.0.0    stringr_1.5.0   
##  [5] dplyr_1.1.3      purrr_1.0.2      readr_2.1.4      tidyr_1.3.0     
##  [9] tibble_3.2.1     ggplot2_3.4.4    tidyverse_2.0.0  pheatmap_1.0.12 
## [13] BiocStyle_2.28.1
## 
## loaded via a namespace (and not attached):
##  [1] sass_0.4.7          utf8_1.2.3          generics_0.1.3     
##  [4] stringi_1.7.12      hms_1.1.3           digest_0.6.33      
##  [7] magrittr_2.0.3      evaluate_0.22       grid_4.3.1         
## [10] timechange_0.2.0    RColorBrewer_1.1-3  bookdown_0.35      
## [13] fastmap_1.1.1       jsonlite_1.8.7      BiocManager_1.30.22
## [16] fansi_1.0.5         scales_1.2.1        jquerylib_0.1.4    
## [19] cli_3.6.1           rlang_1.1.1         munsell_0.5.0      
## [22] withr_2.5.1         cachem_1.0.8        yaml_2.3.7         
## [25] tools_4.3.1         tzdb_0.4.0          colorspace_2.1-0   
## [28] vctrs_0.6.4         R6_2.5.1            lifecycle_1.0.3    
## [31] pkgconfig_2.0.3     pillar_1.9.0        bslib_0.5.1        
## [34] gtable_0.3.4        glue_1.6.2          xfun_0.40          
## [37] tidyselect_1.2.0    rstudioapi_0.15.0   knitr_1.44         
## [40] farver_2.1.1        htmltools_0.5.6.1   rmarkdown_2.25     
## [43] labeling_0.4.3      compiler_4.3.1